Background

Field of the Invention 

This invention pertains in general to telephony and televideo conferencing and in 
particular to performing acoustic echo cancellation on potentially distorted audio signals. 

Background Art 

Two-way audio communications systems, such as speakerphones and video 
communications systems having audio capabilities, utilize both a microphone and a 
loudspeaker. The microphone transmits speech and other sounds from the local terminal 
to remote terminals while the loudspeaker emits sounds received from remote terminals. 
In a typical hands-free system, the loudspeaker and microphone are located in close
proximity and sounds produced by the loudspeaker are picked up by the microphone. 
Without signal processing, therefore, a feedback loop is easily created between the 
loudspeaker and microphone. This feedback can cause the loudspeaker to emit an 
undesirable "howling" noise and cause the remote terminals to hear echoes. 

One simple technique for eliminating feedback is to provide half-duplex switching 
where only the microphone or the loudspeaker is active at any given instant. In a typical 
half-duplex system, the loudspeaker is active until a sound is detected at the microphone. 
Then, the loudspeaker becomes inactive and the microphone becomes active for the 
duration of the sound. Half-duplex systems have many inherent problems, not the least of
which is that a slight noise may unintentionally cause the loudspeaker to cut out. As a
result, it is often difficult to conduct a normal conversation with a system using
half-duplex switching.

More sophisticated audio communications systems use acoustic echo cancellation
(AEC) to reduce echoes and eliminate howling. An AEC system typically utilizes a
sample-by-sample copy of the signal going to the loudspeaker as the basis for an estimate
of the echo returning through the microphone, as taught in U.S. Patent No. 4,965,822,
entitled FULL DUPLEX SPEAKERPHONE, which issued on October 23, 1990 and is
incorporated by reference herein. This estimated echo is subtracted on a
sample-by-sample basis in an attempt to separate out only that portion of the microphone signal due
to sounds coming from sources other than the speaker. An adaptive AEC uses a filter
having slowly adjusted weights to form the echo estimate in an effort to more accurately
subtract the echo from the returned audio signal. Subsequent conditioning performed on
the output of the AEC may include automatic gain control (AGC) and perceived noise
reduction.

A problem with the above approach is that the loudspeakers do not produce sound
pressure signals that are exactly proportional to the driving voltage (or current).
Likewise, microphones are imperfect in an analogous sense. There may also be other
sources of distortion within the sound system, such as amplifiers, analog-to-digital (A/D)
and digital-to-analog (D/A) converters, and perhaps even the user's environment.
Existing AEC systems do not accurately remove the nonlinear components of the returned
signal due to these sources of potential distortion. As a result, a badly distorted form of
the echo can pass through the echo cancellation process. Another undesirable effect of
these introduced distortions is that the adaptation of the AEC parameters is degraded,
leading to a greater perceived echo.

One potential solution to the problem of degraded AEC adaptation is to use a
reduced adaptation rate during periods of very loud sound output. This technique is used,
for example, in U.S. Application No. 09/XXX,XXX, entitled APPARATUS AND
METHOD FOR CONTROLLING AN ACOUSTIC ECHO CANCELER, filed on
January 20, 2000, and incorporated by reference herein. However, reducing the
adaptation rate has the undesirable effect of slowing the system's response to a changing
acoustic environment such as when users are in motion and/or the room temperature
fluctuates.

Another potential solution is to use higher quality loudspeakers and other
components. This solution, however, carries with it considerable expense and places
severe limitations on the designs of the equipment. High-quality loudspeakers are
typically large and heavy and generate strong external magnetic fields. Often, the audio
communications system is integrated into another sound system, such as the audio
subsystem of a laptop computer, where a high-quality loudspeaker cannot be used.

Therefore, there is a need for a technique for more accurately estimating the echo
when performing acoustic echo cancellation. There is also a need for a technique for
more accurately adapting the estimated echo in response to changing acoustic
characteristics.

Disclosure of the Invention


The above needs are met by using modules to estimate the nonlinear distortions in 
the audio signal returned from the microphone that were introduced by the loudspeaker, 
microphone, and related components. 

A typical audio communications system has a plurality of terminals coupled to a 
switch. The terminals can include, for example, dedicated speakerphones, desktop 
handsets with or without speakerphone capabilities, cellular phones, and/or personal 
computer (PC) systems with audio capabilities. The switch may be dedicated to audio 
communications, as is a private branch exchange (PBX), or distributed and 
multifunctional, as is an Internet server. 

Each terminal preferably includes a microphone and a loudspeaker. An amplifier 
amplifies the electrical signals produced by the microphone and provides its output to an 
A/D converter. The A/D converter outputs equivalent digital samples. The loudspeaker 
is driven by another amplifier which, in turn, is driven by the output of a D/A converter. 
The D/A converter receives digital samples representing the sound pressure waves to be 
produced by the loudspeaker. 

In order to cancel echoes of the loudspeaker picked up by the microphone, the 
audio communications system has an acoustic echo cancellation (AEC) module. The 
AEC module can be located in the terminal or elsewhere in the audio communications 
system. U.S. Patent Application No. 09/660,205, entitled COMMUNICATIONS 
SYSTEM AND METHOD UTILIZING CENTRALIZED SIGNAL PROCESSING, filed 
on September 12, 2000, and incorporated by reference herein, describes potential 
locations of the AEC module. The AEC module preferably receives the digital signal
sent to the loudspeaker and the digital signal received from the microphone. 

The digital loudspeaker signal is processed by an audio generation module (AGM)
to model the substantially nonlinear distortions that can occur during the process of
playing the audio signal at the loudspeaker. The AGM includes a modeling path
comprised of one or more distortion modules. Each distortion module receives digital
samples as input, modifies the samples to model a form of distortion, and outputs the
modified samples. A distortion module can be adaptive or it can be partly or wholly
pre-established. Preferably, the AGM can add or remove distortion modules from the
modeling path at any time in response to characteristics of the digital samples or under
direction from other modules. Distortions that can be modeled by the distortion modules
in the AGM modeling path include, for example, amplifier clipping, loudspeaker voice
coil displacement, harmonic distortion introduced by the loudspeaker, and hysteresis in an
iron-core inductor.

The AGM outputs digital sample values to an acoustic echo estimation (AEE)
module. The AEE module preferably uses known adaptive algorithms to adapt the digital
samples to compensate for substantially linear changes in the echo characteristics of the
environment in which the loudspeaker and microphone are located. For example, the
AEE module can modify the digital samples to account for changes in echo attenuation
due to relocation of people in the vicinity of the microphone.

The output of the AEE module is received by an audio sensing module (ASM). 
The ASM performs a function similar to the AGM, except that the ASM models 
substantially nonlinear distortions that occur while sensing the audio signal. Accordingly, 
the ASM models distortions such as microphone 116 center clipping, amplifier
zero-crossing distortion, saturation in either the microphone or the amplifier, and distortions
introduced by the A/D converter. The output of the ASM represents the estimate of the 
echo of the loudspeaker signal in the signal received from the microphone. 

The digital samples returned from the microphone and the output of the ASM are 
received by an adder module. The adder module subtracts the estimated echo received 
from the ASM from the samples returned from the microphone, thereby removing at least 
part of the estimated echo of the loudspeaker from the microphone signal. 

Brief Description of the Drawings 

FIGURE 1 is a high-level block diagram of an audio communications system 
according to an embodiment of the present invention; 


FIGURE 2 is a block diagram illustrating various components of the audio 
communications system including an acoustic echo cancellation (AEC) module; and 

FIGURE 3 is a lower-level view of an exemplary audio generation module in the 
AEC module. 


FIG. 1 is a high-level block diagram of an audio communications system 100
according to an embodiment of the present invention. A plurality of terminals 110A-D
are coupled to a switch 112 via communications links 114A-D. The terminal types can
be heterogeneous or homogeneous. In one embodiment, the terminals include: dedicated
speakerphones, desktop handsets with or without speakerphone capabilities, cellular
phones, and/or personal computer (PC) systems with audio capabilities. As used herein,
the phrase "audio communications system" also includes video conferencing systems
having audio capabilities. Each terminal 110, of which terminal 110A is representative,
preferably includes a microphone 116A and a loudspeaker 118A. As is known in the art,
the microphone 116 converts sound pressure waves into electrical signals and the
loudspeaker 118 converts electrical signals into sound pressure waves.

The communications links 114 carry audio data representative of sounds picked
up by the microphone 116 and to be played by the loudspeaker 118 to/from the switch
112. The communications links 114 may be wired or wireless. Moreover, the links 114
may include dedicated private links, shared links utilizing a publicly-accessible telephone
network, and/or links using a public or private data communications network such as the
Internet. Data traveling over the links 114 may pass through one or more switches or link
types before reaching the switch 112 or terminal 110, although a preferred embodiment of
the present invention treats a link passing through multiple links and switches as a single
logical link. The data carried by the communications links 114 can be digital and/or
analog. If the data is digital, it is preferably transmitted as a series of discrete data
packets, such as Internet protocol (IP) packets. In one embodiment, the digital data is
encoded into a compressed format.

The switch 112 switches and routes communications among the terminals. The
switch 112 can be, for example, a private branch exchange (PBX) located at a business or
other entity, a publicly-accessible switch operated by a telephone company or other entity
providing audio communications, or an Internet server supporting Internet telephony.
Thus, the term "switch" includes any device or combination of devices capable of
providing the switching and other functionality attributed to the switch herein.

In one embodiment, the terminals 110 and/or switch 112 have one or more of the
components found in a typical computer system, including a processing unit, random
access memory (RAM), read-only memory (ROM), a storage device such as a hard drive,
and/or other hardware and software for providing the functionality described herein.
Aggregations of machine-executable code, data, circuitry, and/or data storage areas for
performing a specific purpose or purposes are referred to as "modules." Different
modules may share common code, data, and/or circuitry. The modules include, for
example, signal processing modules, digital-to-analog (D/A) and analog-to-digital (A/D)
converter modules, and amplifier modules. Modules may hold in their storage areas
previous values of signals and current statistics derived therefrom. Modules can also use
adaptive techniques, or training, to perform the modules' functionalities. As used herein,
the terms "adaptation" and "training" are interchangeable and refer to acting on a signal
responsive to previous values of that signal or other signals, statistics derived from the
signals, and/or external controls or sensors.

FIG. 2 is a block diagram illustrating various components of the audio 
communications system including an acoustic echo cancellation (AEC) module 210. 
FIG. 2 illustrates the microphone 116 of FIG. 1 having its output coupled to an amplifier 
212. As is known in the art, the microphone 116 converts sound pressure waves into
electrical signals. The amplifier 212 amplifies the electrical signals and provides its 
output to an A/D converter 214. The A/D converter 214 outputs digital sample values 
representative of the sound pressure waves to the AEC module 210. 

FIG. 2 also illustrates the loudspeaker 118 of FIG. 1. The loudspeaker 118 generates sound
pressure waves in response to an input received from an amplifier 216. The amplifier
216, in turn, is driven by the output of a D/A converter 218. The D/A converter 218
receives digital sample values representing the sound pressure waves as input from the
link 114 or another source.

In general, the AEC module 210 estimates the echo of sounds played by the
loudspeaker 118 that are picked up by the microphone 116, subtracts the estimated echo
from the microphone's audio signal, and outputs the resulting echo-cancelled signal. In
one embodiment, the AEC module 210 is located in the terminal 110. Accordingly, the
output of the AEC module 210 is passed over the communications links 114 to the switch
112. In alternative embodiments, the AEC module 210 is located in the switch 112 or
anywhere else that echo cancellation is desired and representations of the loudspeaker and
microphone signals are available.

Turning to the AEC module 210 itself, the digital samples output by the A/D
converter 214, which represent the audio signal sensed by the microphone 116, are received by an
adder module 220. The adder module 220 also receives an input 222 providing digital
samples representing the echo from the loudspeaker 118 estimated to be present in the
microphone signal. The adder module 220 preferably adds the negative of the estimated
echo to the signal received from the A/D converter 214. Preferably, the adder module
220 works on a sample-by-sample basis. In one embodiment, both the estimated echo
samples received from the input 222 and the sample values received from the A/D
converter 214 bear sequencing information that the adder module 220 uses to match the
samples. U.S. Patent Application No. 09/660,205, incorporated by reference herein,
discloses additional details related to the sequencing information.
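
The sample-matching subtraction performed by the adder module can be sketched as follows. This is a minimal illustration only: the representation of sequencing information as an integer paired with each sample, the function name `subtract_estimated_echo`, and the pass-through policy for unmatched samples are assumptions made for the example and are not taken from the referenced application.

```python
from typing import Dict, Iterable, List, Tuple

# (sequence number, sample value); this pairing is an assumption of the sketch.
Sample = Tuple[int, float]


def subtract_estimated_echo(
    mic_samples: Iterable[Sample],
    echo_estimates: Iterable[Sample],
) -> List[Sample]:
    """Add the negative of each echo estimate to the microphone sample
    that carries the same sequence number."""
    echo_by_seq: Dict[int, float] = dict(echo_estimates)
    output: List[Sample] = []
    for seq, mic_value in mic_samples:
        # Hypothetical policy: a sample with no matching estimate passes
        # through unchanged.
        estimate = echo_by_seq.get(seq, 0.0)
        output.append((seq, mic_value + (-estimate)))
    return output


mic = [(100, 0.5), (101, -0.25), (102, 0.125)]
echo = [(101, -0.125), (100, 0.25)]  # estimates may arrive out of order
result = subtract_estimated_echo(mic, echo)
print(result)
```

Keying the estimates by sequence number lets the subtraction tolerate estimates that arrive in a different order than the microphone samples.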

The output of the adder module 220 is passed to a perceived noise reduction
module 224. This module 224 preferably reduces perceived noise in the audio signal.
Techniques for reducing perceived noise are well known in the art.

The output of the perceived noise reduction module 224 is preferably passed to an
automatic gain control (AGC) module 226. As is known in the art, the AGC module 226
preferably isolates times during which local speech is thought to be present in the input
signal and adjusts the signal gain so that the speech is near a predetermined level when
considered on average. The AGC module 226 can use adaptive techniques to perform
AGC. The output 228 of the AGC module 226 is preferably provided to the switch 112
via the communications links 114 as described above.
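
One way such gain control might behave is sketched below. The class name `SimpleAgc`, the RMS-threshold speech detector, the exponential smoothing of the speech level, and all parameter values are assumptions chosen for illustration; the description above does not prescribe a particular AGC technique.

```python
import math
from typing import List, Optional, Sequence


class SimpleAgc:
    """Frame-based gain control sketch; frames are assumed to be non-empty."""

    def __init__(self, target_rms: float = 0.1, speech_threshold: float = 0.01,
                 smoothing: float = 0.9, max_gain: float = 10.0) -> None:
        self.target_rms = target_rms
        self.speech_threshold = speech_threshold
        self.smoothing = smoothing
        self.max_gain = max_gain
        self.level: Optional[float] = None  # running average speech level
        self.gain = 1.0

    def process(self, frame: Sequence[float]) -> List[float]:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # Update the level estimate only while speech is thought to be present.
        if rms > self.speech_threshold:
            if self.level is None:
                self.level = rms
            else:
                self.level = (self.smoothing * self.level
                              + (1.0 - self.smoothing) * rms)
            self.gain = min(self.max_gain, self.target_rms / self.level)
        return [self.gain * x for x in frame]


agc = SimpleAgc()
quiet_speech = [0.05, -0.05] * 4
boosted = agc.process(quiet_speech)   # gain moves toward target_rms / level
silence = agc.process([0.0] * 8)      # gain is held during silence
```

Holding the gain during frames classified as non-speech keeps background noise from being amplified toward the target level.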

The AEC module 210 also receives an input 230 carrying digital sample values
representing the audio signal being sent to the loudspeaker 118 of the terminal 110. If the
AEC module 210 is located in the terminal 110, then this input 230 is received from the
switch 112 via the communications links 114. The loudspeaker 118 digital sample values
are received by an audio generation module (AGM) 232 within the AEC module 210.

The AGM 232 preferably modifies the digital sample values to model
substantially nonlinear distortions that can occur during the process of generating the
audio signal. FIG. 3 is a block diagram illustrating a lower-level view of the AGM 232
according to an exemplary embodiment of the present invention. The AGM 232 includes
a modeling path 310 comprised of logical interconnects among one or more distortion
modules 312 that operate on the digital samples traveling through the path. Each
distortion module 312 receives digital samples as input, modifies the samples to model a
form of distortion, and outputs the modified samples. Preferably, the AGM 232 can add
or remove distortion modules from the modeling path 310 at any time in response to
characteristics of the digital samples or under direction from other modules.

The AGM 232 preferably models effects which are substantially nonlinear. 
Certain embodiments utilize artificial neural networks (ANNs) to achieve adaptation. 
Those ANNs which are not adaptive may be preset at the time of manufacture and do
not require feedback for further adaptation. ANNs in adaptive modules 312 utilize 
internal and/or external feedback. Such feedback may be from other distortion modules 
312, from the loudspeaker digital signal, and/or from the microphone signal before or 
after the adder module 220. These many possible feedback paths have been omitted from 
the modeling path 310 in FIG. 3 in order to clarify the teachings of the present invention. 

The example of a modeling path 310 illustrated in the AGM 232 of FIG. 3 has 
three distortion modules 312A, 312B, 312C arranged in sequence. Each distortion 
module 312 preferably contains a filter or other operator that acts on the input samples. 
The module 312 can be adaptive or it can be partly or wholly pre-established. Likewise, 
the module 312 can operate in the time or frequency domains. The module 312 can also
act in response to short or long-term signal characteristics to model effects such as heat 
build-up. 
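
A modeling path whose distortion modules can be added or removed at run time might be organized as in the following sketch. The class `ModelingPath`, its method names, and the two placeholder modules `double` and `clip_to_unit` are hypothetical and serve only to show the sequential structure.

```python
from typing import Callable, List, Optional, Sequence

# A distortion module is modeled here as a function from samples to samples.
DistortionModule = Callable[[Sequence[float]], List[float]]


class ModelingPath:
    """Ordered chain of distortion modules applied in sequence."""

    def __init__(self) -> None:
        self._modules: List[DistortionModule] = []

    def add(self, module: DistortionModule,
            position: Optional[int] = None) -> None:
        if position is None:
            self._modules.append(module)
        else:
            self._modules.insert(position, module)

    def remove(self, module: DistortionModule) -> None:
        self._modules.remove(module)

    def process(self, samples: Sequence[float]) -> List[float]:
        out = list(samples)
        for module in self._modules:
            out = module(out)
        return out


def double(samples: Sequence[float]) -> List[float]:
    # Placeholder gain stage, for illustration only.
    return [2.0 * s for s in samples]


def clip_to_unit(samples: Sequence[float]) -> List[float]:
    # Placeholder hard limiter, for illustration only.
    return [max(-1.0, min(1.0, s)) for s in samples]


path = ModelingPath()
path.add(double)
path.add(clip_to_unit)
clipped = path.process([0.25, 0.75, -1.0])
path.remove(clip_to_unit)  # e.g. under direction from another module
unclipped = path.process([0.25, 0.75, -1.0])
```

Treating each module as an interchangeable callable keeps the path agnostic as to whether a given module is adaptive, pre-established, or operates in the time or frequency domain.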

Preferably, each distortion module 312 independently models a form of distortion.
In FIG. 3, the first distortion module 312A models amplifier clipping by enforcing a hard
limit on signal amplitudes. Thus, this distortion module 312A models the effects of the
speaker amplifier 216 in the terminal 110 on the analog signal sent to the loudspeaker
118. The second distortion module 312B models loudspeaker 118 voice coil (or
equivalent structure) displacement. In one embodiment, this distortion module 312B
estimates the nonlinear relationship between the voice coil displacement and the driving
current. In one embodiment, the driving current estimate received by the voice coil
displacement module 312B is generated by the amplifier clipping module 312A and so
may be a nonlinear representation of the loudspeaker digital samples. The third distortion
module 312C models harmonic distortion introduced by the loudspeaker 118. In one
embodiment, this distortion module 312C applies harmonic distortion with a strength
modulated by the energy in the spectral components subject to the distortion. Thus, this
distortion module 312C mimics the operation of a loudspeaker 118 driven with high
electrical amplitudes where diaphragms distort and resonate. Other distortion modules
312 that may be utilized in the modeling path 310 include modules that account for
distortions introduced by the D/A conversion module 218 and modules that account for
hysteresis in iron core inductors.
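
The three modules of FIG. 3 can be illustrated with simple time-domain stand-ins, shown below. Only the hard limit of the clipping module follows directly from the description; the `tanh` curve used for voice coil displacement, the cubic term used for harmonic distortion, the use of frame mean-square energy as the modulating quantity, and every parameter value are assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def amplifier_clipping(samples: Sequence[float],
                       limit: float = 0.8) -> List[float]:
    """Enforce a hard limit on signal amplitudes."""
    return [max(-limit, min(limit, s)) for s in samples]


def voice_coil_displacement(current: Sequence[float],
                            max_excursion: float = 1.0) -> List[float]:
    """Assumed compressive mapping from driving current to displacement."""
    return [max_excursion * math.tanh(i / max_excursion) for i in current]


def harmonic_distortion(samples: Sequence[float],
                        strength: float = 0.5) -> List[float]:
    """Add a cubic (third-harmonic) term scaled by the frame energy.
    The frame is assumed to be non-empty."""
    energy = sum(s * s for s in samples) / len(samples)
    return [s + strength * energy * s ** 3 for s in samples]


def agm(samples: Sequence[float]) -> List[float]:
    """Apply the three stand-in modules in the order shown in FIG. 3."""
    clipped = amplifier_clipping(samples)
    displaced = voice_coil_displacement(clipped)
    return harmonic_distortion(displaced)


quiet = agm([0.01, -0.01])  # nearly unchanged at low amplitude
loud = agm([2.0, -2.0])     # limited, compressed, then distorted
```

Because the displacement stage receives the output of the clipping stage, its input is already a nonlinear representation of the loudspeaker samples, as noted above.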

In one embodiment of the present invention, the distortion modules 312 are
tailored to model the distortions introduced by specific types of hardware. For example,
if the AGM 232 is located in the terminal 110, the amplifier clipping 312A and voice coil
displacement 312B modules can be specifically tailored for the amplifiers and voice coils
included in the terminal 110.

The AGM 232 outputs digital sample values representing the distorted audio
signal to an acoustic echo estimation (AEE) module 234. The AEE module 234
preferably uses adaptive algorithms to adapt the digital samples to compensate for
substantially linear changes in the echo characteristics of the environment in which the
loudspeaker 118 and microphone 116 are located. For example, the AEE module 234 can
modify the digital samples to account for changes in echo attenuation due to relocation of
people in the vicinity of the microphone 116.
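
As one example of a known adaptive algorithm that could serve in this role, the sketch below uses a normalized least-mean-squares (NLMS) filter. The choice of NLMS, the class name `NlmsEchoEstimator`, the tap count, the step size, and the simulated three-tap room response are all assumptions; for brevity the residual is fed back directly, without the audio sensing stage that follows the AEE module in FIG. 2.

```python
import random
from typing import List


class NlmsEchoEstimator:
    """Adaptive FIR filter modeling the substantially linear echo path."""

    def __init__(self, num_taps: int = 8, step_size: float = 0.5,
                 regularization: float = 1e-6) -> None:
        self.weights: List[float] = [0.0] * num_taps
        self.history: List[float] = [0.0] * num_taps
        self.step_size = step_size
        self.regularization = regularization

    def estimate(self, reference_sample: float) -> float:
        """Shift in the newest reference sample and return the echo estimate."""
        self.history = [reference_sample] + self.history[:-1]
        return sum(w * x for w, x in zip(self.weights, self.history))

    def adapt(self, error: float) -> None:
        """Adjust the weights using the residual left after subtraction."""
        power = sum(x * x for x in self.history) + self.regularization
        scale = self.step_size * error / power
        self.weights = [w + scale * x
                        for w, x in zip(self.weights, self.history)]


# Simulated room: the echo is a delayed, attenuated copy of the reference.
rng = random.Random(0)
true_path = [0.0, 0.6, 0.3]
aee = NlmsEchoEstimator(num_taps=8)
past = [0.0] * len(true_path)
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    past = [x] + past[:-1]
    mic = sum(h * p for h, p in zip(true_path, past))
    residual = mic - aee.estimate(x)
    aee.adapt(residual)
```

In this noiseless simulation the weights approach the simulated room response; if the response changes, for example because people move, the same update tracks the new response.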

The digital sample values output by the AEE module 234 are preferably received 
by an audio sensing module (ASM) 236. The ASM 236 preferably modifies the digital 
sample values to model distortions that can occur in the process of sensing the audio 
signal. Like the AGM 232, the ASM 236 preferably includes a modeling path comprised 
of logical interconnects among one or more distortion modules. The modeling path for 
the ASM 236 is not shown in the figures because it would be redundant in view of FIG. 3. 
Also like the AGM 232, the ASM 236 preferably models substantially nonlinear 
distortions. Unlike the AGM 232, the ASM 236 preferably models distortions such as
microphone 116 center clipping, amplifier zero-crossing distortion, saturation in either the
microphone or the amplifier, and/or distortions introduced by the A/D converter 214. The
output of the ASM 236 is provided to the adder module 220 and becomes the input signal 
representing the echo from the loudspeaker 118 estimated to be present in the microphone 
signal described above. 
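
Sensing-side distortions of the kinds listed above might be modeled along the following lines. The dead-zone form of center clipping, the hard-limit form of saturation, the uniform quantizer standing in for the A/D converter, and the parameter values are assumptions of this sketch rather than details given in the description.

```python
import math
from typing import List, Sequence


def center_clip(samples: Sequence[float],
                threshold: float = 0.01) -> List[float]:
    """Zero out small amplitudes and shift larger ones toward zero."""
    out: List[float] = []
    for s in samples:
        if abs(s) <= threshold:
            out.append(0.0)
        else:
            out.append(s - math.copysign(threshold, s))
    return out


def saturate(samples: Sequence[float], limit: float = 1.0) -> List[float]:
    """Limit amplitudes to the range the microphone or amplifier can pass."""
    return [max(-limit, min(limit, s)) for s in samples]


def quantize(samples: Sequence[float], bits: int = 16) -> List[float]:
    """Round to the nearest step of a signed converter with the given width."""
    levels = 2 ** (bits - 1)
    return [max(-levels, min(levels - 1, round(s * levels))) / levels
            for s in samples]


def asm(samples: Sequence[float]) -> List[float]:
    """Apply the stand-in sensing distortions in sequence."""
    return quantize(saturate(center_clip(samples)))


estimated_echo = asm([0.005, 0.5, -1.5])
```

The output of such a chain would play the role of the estimated echo supplied to the adder module 220.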

Accordingly, the AEC module 210 of the present invention accurately models the 
effects of distortion on the audio signals. The modeled types of distortion include 
nonlinear distortions introduced while generating and sensing the audio signal and linear 
echoes introduced responsive to room characteristics. This distortion modeling enables 
the AEC to more accurately cancel the echo in the signal received from the microphone.

The above description is included to illustrate the operation of the preferred
embodiments and is not meant to limit the scope of the invention. The scope of the 
invention is to be limited only by the following claims. From the above discussion, many 
variations will be apparent to one skilled in the relevant art that would yet be 
encompassed by the spirit and scope of the invention. 